Chapter 2 Duplicate Record Detection Using ANFIS
Authors
Abstract
The problem of duplicate detection is to determine whether the same real-world object is represented by two or more distinct entries in a database. Duplicate detection is also known as record linkage or record matching. It is a widely researched topic of vital importance in fields such as master data management, data warehousing and ETL (Extraction, Transformation and Loading), customer relationship management, and data integration (Elmagarmid et al. 2007).
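To make the matching step concrete, a minimal sketch of pairwise duplicate detection follows. It is not the chapter's ANFIS model: the field names, the Jaccard token-set similarity, and the 0.8 threshold are all illustrative assumptions.

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the token sets of two strings."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)

def is_duplicate(rec1: dict, rec2: dict, fields, threshold: float = 0.8) -> bool:
    """Flag two records as duplicates when the average per-field
    similarity reaches the threshold (threshold is an assumption)."""
    score = sum(jaccard(rec1[f], rec2[f]) for f in fields) / len(fields)
    return score >= threshold

# Hypothetical records for illustration only.
r1 = {"name": "John A. Smith", "city": "New York"}
r2 = {"name": "john smith", "city": "new york"}
print(is_duplicate(r1, r2, ["name", "city"]))
```

In practice the per-field scores would feed a learned decision function (such as the ANFIS model this chapter studies) rather than a fixed average and threshold.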
Similar resources
Chapter 3 Duplicate Record Detection Using GA and PSO
The present chapter extends the research discussed in Chapter 2 by applying optimization algorithms. Moises G. de Carvalho et al. (2011) proposed a genetic programming approach to record deduplication. This approach automatically derives a duplicate record detection function by combining several pieces of evidence taken from the data. This function makes it possible to identify whether t...
A New Method for Duplicate Detection Using Hierarchical Clustering of Records
Accuracy and validity of data are prerequisites for the proper operation of any software system. Errors can always occur in data owing to human and system faults. One such error is the existence of duplicate records in data sources. Duplicate records refer to the same real-world entity. Only one of them should exist in a data source, but for some reasons like aggregation of ...
TA-DRD: A Three-step Automatic Duplicate Record Detection
Duplicate record detection is a key step in Deep Web data integration, but existing approaches do not scale to its large-scale nature. In this paper, a three-step automatic approach to duplicate record detection in the Deep Web is proposed. It first uses a cluster ensemble to select initial training instances. It then uses tri-training classification to construct a classification model. Final...
PSO Algorithm to Select Subsets of Q-Gram Features for Record Duplicate Detection
Though data quality issues grow with the ever-increasing quantity of data, it is a welcome sign that significant improvements have lately been made in data engineering. Consequently, private and government organizations have invested significantly in developing methods for removing replicas from data repositories. This phenomenon has caused significant interest among researcher...
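As a sketch of the q-gram features the title above refers to (the PSO feature-selection step itself is not shown), character q-grams can be extracted from strings and compared as sets. The `#` padding character and the q = 2 default are assumptions for illustration:

```python
def qgrams(s: str, q: int = 2) -> list:
    """Sliding-window character q-grams with boundary padding,
    so that leading and trailing characters also form q grams."""
    padded = "#" * (q - 1) + s.lower() + "#" * (q - 1)
    return [padded[i:i + q] for i in range(len(padded) - q + 1)]

def qgram_similarity(a: str, b: str, q: int = 2) -> float:
    """Jaccard similarity over the q-gram sets of two strings."""
    ga, gb = set(qgrams(a, q)), set(qgrams(b, q))
    return len(ga & gb) / len(ga | gb)

# Typo-tolerant matching: "smith" and "smyth" share several bigrams.
print(qgram_similarity("smith", "smyth"))
```

In the paper's setting, a PSO algorithm would search for the subset of such q-gram features that best discriminates duplicates from non-duplicates; that search is beyond this sketch.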
PC-Filter: A Robust Filtering Technique for Duplicate Record Detection in Large Databases
In this paper, we propose PC-Filter (PC stands for Partition Comparison), a robust data filter for approximately duplicate record detection in large databases. PC-Filter distinguishes itself from all existing methods by using the notion of partitions in duplicate detection. It first sorts the whole database and splits the sorted database into a number of record partitions. The Partition ...
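The sort-then-partition idea described above can be sketched as follows. The similarity measure (difflib's `SequenceMatcher`), the fixed partition size, and the 0.85 threshold are illustrative assumptions, and unlike the real PC-Filter this sketch compares pairs only within each partition:

```python
from difflib import SequenceMatcher
from itertools import combinations

def partition_detect(records, size=3, threshold=0.85):
    """Sort records, split the sorted list into fixed-size partitions,
    and compare candidate pairs only inside each partition."""
    records = sorted(records, key=str.lower)
    dups = []
    for start in range(0, len(records), size):
        part = records[start:start + size]
        for a, b in combinations(part, 2):
            if SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold:
                dups.append((a, b))
    return dups

# Hypothetical records for illustration only.
names = ["Jon Smith", "John Smith", "Mary Jones",
         "Marie Jones", "Bob Lee", "Rob Lee"]
print(partition_detect(names))
```

Note the trade-off this filtering makes: "Bob Lee" and "Rob Lee" sort into different partitions and are never compared, which is why the full PC-Filter also performs selected cross-partition comparisons.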
Publication date: 2014